Conversation

@Giuseppe5 commented on Oct 30, 2025

The goal of this PR is to allow any user to easily change how quantization is applied during the fine-tuning process.
A longer description of the issue can be found in #3521.

By making _prepare_for_qat a staticmethod, it is possible to replace it like so:

def my_quant_func(...):
    ...
    return model

FastLlamaModel._prepare_for_qat = my_quant_func
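
For example, a drop-in replacement might look like the following. This is only a minimal sketch, assuming the replacement keeps the same (model, qat_scheme) signature as the built-in _prepare_for_qat; apply_my_quantization is a hypothetical helper standing in for whatever quantizer (e.g. a Brevitas-based one) you actually use:

import torch

# Minimal sketch; `apply_my_quantization` is a hypothetical helper standing
# in for a real quantizer (for instance one built on Brevitas).
def my_quant_func(model, qat_scheme):
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            apply_my_quantization(module, qat_scheme)
    return model

FastLlamaModel._prepare_for_qat = my_quant_func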

As mentioned in the issue above, even allowing a custom quantization function (and thus a custom quantizer such as Brevitas) does not solve all the issues, since Unsloth makes strong assumptions about the names of the quantization functions for weights and activations.

This can be patched without changing anything within Unsloth itself, but I believe it might be worth thinking about a more general implementation of the naming conventions (a task for this PR or a follow-up, you tell me).

cc @Datta0

@danielhanchen (Contributor) commented:

Thanks! This is OK, but we'll need to confirm whether it might cause issues downstream.

@danielhanchen (Contributor) commented:

@rolandtannous Can you check if this causes any issues? Thanks

@danielhanchen (Contributor) commented:

@gemini-code-assist review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a flexible way to apply custom quantization schemes by exposing _prepare_for_qat as a replaceable method on FastLlamaModel and FastModel. This is a great addition for extensibility and a good first step towards the goal described in issue #3521.

However, I've found critical issues in the implementation of the new methods in both unsloth/models/llama.py and unsloth/models/loader.py. The method signatures are incorrect for the decorators used (@staticmethod and @classmethod), which will lead to TypeError exceptions at runtime. My review comments include specific suggestions to fix these issues.

Regarding the naming conventions for quantization functions you mentioned, that's a valid concern for a more general solution. I agree it could be addressed in a follow-up PR to keep this one focused.

Once the signature issues are addressed, this PR will be a solid improvement.

class FastLlamaModel:

    @staticmethod
    def _prepare_for_qat(cls, model, qat_scheme):


critical

The signature of the _prepare_for_qat static method is incorrect. A staticmethod does not receive the class as an implicit first argument. The cls parameter will cause a TypeError at runtime because the method will be called with two arguments (model, qat_scheme), but it expects three. To fix this, the cls parameter should be removed.

Suggested change:

-    def _prepare_for_qat(cls, model, qat_scheme):
+    def _prepare_for_qat(model, qat_scheme):
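
A standalone sketch (not Unsloth code) illustrating the failure mode: a staticmethod gets no implicit first argument, so an extra cls parameter simply shifts every call short by one argument.

class Demo:
    @staticmethod
    def broken(cls, model, qat_scheme):  # `cls` is NOT supplied automatically for a staticmethod
        return model

    @staticmethod
    def fixed(model, qat_scheme):
        return model

print(Demo.fixed("model", "fp8"))    # works: prints "model"
try:
    Demo.broken("model", "fp8")      # only two arguments for three parameters
except TypeError as e:
    print(e)  # broken() missing 1 required positional argument: 'qat_scheme'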

Comment on lines +573 to +576

    @classmethod
    def _prepare_for_qat(model, qat_scheme):
        model = _prepare_model_for_qat(model, qat_scheme)
        return model


critical

This method is decorated with @classmethod, but its signature (model, qat_scheme) doesn't account for the implicit cls argument that class methods receive, which will cause a TypeError. The first argument is also confusingly named model when it would receive the class object.

Since this method doesn't use the class object, and for consistency with FastLlamaModel, it should be a @staticmethod with the correct signature.

Suggested change:

-    @classmethod
-    def _prepare_for_qat(model, qat_scheme):
-        model = _prepare_model_for_qat(model, qat_scheme)
-        return model
+    @staticmethod
+    def _prepare_for_qat(model, qat_scheme):
+        model = _prepare_model_for_qat(model, qat_scheme)
+        return model
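
The classmethod variant fails for the mirror-image reason: Python injects the class itself as the first argument, so every explicit parameter shifts by one slot. A standalone sketch (not the Unsloth classes themselves):

class Demo:
    @classmethod
    def _prepare_for_qat(model, qat_scheme):
        # Python passes the class itself as the first argument, so `model`
        # receives Demo and `qat_scheme` receives the actual model object.
        return model

try:
    Demo._prepare_for_qat("actual_model", "fp8")
except TypeError as e:
    print(e)  # _prepare_for_qat() takes 2 positional arguments but 3 were given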

@gemini-code-assist bot left a comment

Code Review

This pull request aims to provide a way for users to customize the quantization process by exposing _prepare_for_qat as a replaceable method. This is a valuable addition for flexibility. However, I've found critical issues in the implementation of the new methods in both unsloth/models/llama.py and unsloth/models/loader.py. The method signatures are incorrect and will lead to TypeError exceptions at runtime. I have provided suggestions to fix these bugs. I've also pointed out a minor style issue regarding an unnecessary blank line.

Comment on lines +1791 to +1793

    def _prepare_for_qat(cls, model, qat_scheme):
        model = _prepare_model_for_qat(model, qat_scheme)
        return model


critical

The signature for the new static method _prepare_for_qat is incorrect. A static method does not receive the class as its first argument. The cls parameter will cause a TypeError at runtime because it will be populated with the model object from the call site, and the model parameter will receive qat_scheme, leaving the qat_scheme argument unfulfilled.

To fix this, you should remove the cls parameter from the method signature.

Suggested change:

-    def _prepare_for_qat(cls, model, qat_scheme):
-        model = _prepare_model_for_qat(model, qat_scheme)
-        return model
+    def _prepare_for_qat(model, qat_scheme):
+        model = _prepare_model_for_qat(model, qat_scheme)
+        return model

Comment on lines +573 to +576

    @classmethod
    def _prepare_for_qat(model, qat_scheme):
        model = _prepare_model_for_qat(model, qat_scheme)
        return model


critical

The new _prepare_for_qat method is decorated with @classmethod, but its signature is incorrect. The first argument to a class method is the class itself. Here, it's confusingly named model, and the actual model object is passed as the second argument, which is received by the qat_scheme parameter. This will lead to a TypeError at runtime.

For consistency with the changes in FastLlamaModel and to ensure correct behavior, this should be a staticmethod with the correct signature.

Suggested change:

-    @classmethod
-    def _prepare_for_qat(model, qat_scheme):
-        model = _prepare_model_for_qat(model, qat_scheme)
-        return model
+    @staticmethod
+    def _prepare_for_qat(model, qat_scheme):
+        model = _prepare_model_for_qat(model, qat_scheme)
+        return model
